Collaborators: Tomo Tanaka, John Eykelenboom.

Cartoon project explanation

shiny app

1 Data

In our data we distinguish four states, marked by colour: blue, brown, pink, red. For the purpose of the model we merge blue and brown together into one state. Here is an example from an experiment with 57 cells.

The next figure shows the proportions of each colour as a function of time. The curves are smoothed with a running mean over the window of 5 time points.

2 Model

The model consists of three states and a set of rules.

2.1 States

  • blue/brown (B)
  • pink (P)
  • red (R)

2.2 Rules

  • Simulation is carried out at a discrete time step of 1 min
  • There is a fixed time for nuclear envelope breakdown, \(t_0 = 0\); \(t_0\) can be changed if necessary
  • Time \(t_1\) is a random variable with exponential distribution with time scale \(\tau_1\)
  • Time \(\Delta t_2\) is a random variable with exponential distribution with time scale \(\tau_2\)
  • Time \(\Delta t_3\) is a random variable with exponential distribution with time scale \(\tau_3\)
  • The cell is in state B before time \(t_0-t_1\).
  • B\(\rightarrow\)P occurs after \(t_0-t_1\) with rate \(k_1\)
  • P\(\rightarrow\)R occurs after \(t_0-t_1 + \Delta t_2\) with rate \(k_2\)*
  • P\(\rightarrow\)B occurs after \(t_0-t_1 + \Delta t_3\) with rate \(k_3\)

* A switch, \(t_{2, ref}\) was introduced to the model, selecting the reference for time \(t_2\). In the default position (\(t_{2, ref} = 1\)) the B\(\rightarrow\)P activation time is \(t_0-t_1 + \Delta t_2\). When the switch is set to \(t_{2, ref} = 0\), the activation time is \(t_0 + \Delta t_2\), that is, it can only occur after the nuclear envelope breakdown. The idea of the switch was to separate pink and red curves, as observed in some data sets.

I use a Markov chain approach. The next state is generated from the current state based on rules outlined above. The rates, \(k\), are converted into probabilities over a given time step \(\Delta t\) as \(Pr = 1 - e^{1 - k\Delta t}\). The transition times, \(t_1\), \(t_2 = \Delta t_2\) and \(t_3 = \Delta t_3\), are generated before the simulation starts for a given cell.

The cell timeline is repeated for \(n_{\rm cell}\) times and then the colour proportions are found at each time point.

Note: though we experimented with varying \(t_0\) and a possibility of a “reversed” P\(\rightarrow\)B transition, they are not discussed in the paper. Also, we don’t discuss \(t_{2, ref}\) switch.

2.3 Example

Here is an example of the model.

And here is an example of the the model where P\(\rightarrow\)R transition can occur only after \(t_0 = 0\).

2.4 Parameter tuner

There is a shiny app that allows tuning parameters in search for the best solution.

3 Fitting model to the data

I attempted fitting model to our data. It is not an easy task, as the model is stochastic. After some testing I found that modified BFGS (a quasi-Newton method, also known as a variable metric algorithm, by Broyden, Fletcher, Goldfarb and Shanno, 1970) which allows box constraints (Byrd et al. 1995), gives reasonable results. Again, due to stochasticity of the model, this algorithm often finds false local minima. I run it several times to find the best minimum. It is a crude and time-consuming method, but gives results better than manual tuning.

Fitting is constrained to time points between -50 and 30 min. The minimized quantity is

\({\rm rms} = \sqrt{\sum_{c \in \{B,P,R\}} \sum_i (O_{c,i} - E_{c,i})^2}\)

that is, the square root of the sum of squared residuals over all time points and all colours.

3.1 Default model

First, we fit the model with \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\) free. To improve the chances of finding the correct global minimum, I run the fit with 10,000 cells and 30 tries.

3.1.1 Scramble

We suspect that \(t_0\) might not be exactly zero. Hence I fit the model with \(t_0 = 0, -5, -10, -15\) minutes.

Here (and henceforward) I quote the snakemake rule (in Snakefile file) to create these results:

rule fit_scramble_t0

Let’s read and plot the results.

I think the issue here is not \(t_0\) but the fact that the red curve growths is too fast for the model to fully account for it.

3.1.2 All conditions

rule fit_all

Here are fit results for all conditions with \(t_0\) fixed at zero and four parameters free: \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\). The RMS is computed over (-50, 30) min range (though the model is shown on the entire range).

3.2 Three parameters, \(\tau_2\) fixed at 8 min

We notice (see later) that \(\tau_2\) is poorly constrained and similar in all conditions. We fix it at 8 min and fit the model with three free parameters.

rule fit_all_3par

3.3 RAD21

RAD21 looks problematic. We want to see if it works with higher \(\tau_2\). Below, there is a comparison between the default 3-parameter fit (\(\tau_2\) fixed at 8 min) and a fit where \(\tau_2\) was fixed at 15 min.

rule fit_RAD21_t15

As we can see the model cannot reproduce RAD21 data because of the excess of pink state at early times. This is consistent with cohesin not been properly established due to RAD21 removal. This data set is beyond the scope of our model.

3.4 Switch: P\(\rightarrow\)R only after \(t_0\)

Here are results from a modified model, where P\(\rightarrow\)R transition can happen only after \(t_0\). By default \(t_0 = 0\) and the results are shown in the left panels below. As you can see, the red curve in the model lags behind the red curve in the data. Hence, I fitted the data again, but fixing \(t_0 = -10\) min this time (right panels). This aligns the red curves a little better.

We can also see that the pink curve is more peaked, as opposed to the smooth rise and decay in the default model.

rule fit_all_ref0
rule fit_all_ref0_t10

3.5 Confidence intervals on fit parameters

It is not easy to find confidence intervals in fit parameters when not only the model is non-analytic, but also stochastic. The only feasible approach it bootstrap, which is computationally expensive. Here I give it a try.

The bootstrap is performed by sampling with replacement from the data and fitting each sample. Then, we look at the distribution of bootstrapped fit parameters.

3.5.1 Bootstraps (4 parameters)

Running bootstraps on the cluster. This is quite slow!

rule boot_all

Once all is done, we read the results for selected sets.

Here is the distribution of bootstrap result. Each bootstrap gives a set of fit parameters. Figures below show the distribution of fit parameters across all bootstraps.

And here are fit parameters and their 95% confidence intervals:

These are box plots with whiskers encompassing 90% of data:

In this table we show the median, lower and upper 95% CI and the 25th and 75th percentiles of each parameter:

set HL1_med HL2_med k1_med k2_med HL1_lo95 HL2_lo95 k1_lo95 k2_lo95 HL1_up95 HL2_up95 k1_up95 k2_up95 HL1_p25 HL2_p25 k1_p25 k2_p25 HL1_p75 HL2_p75 k1_p75 k2_p75
untreated 12.5 6.48 0.0654 0.0448 11.1 4.56 0.0551 0.0338 14.5 9.92 0.0835 0.0577 11.9 5.26 0.0612 0.0409 13 7.66 0.0709 0.0497
scramble 12.8 6.29 0.0652 0.044 11.3 4.36 0.0562 0.0338 14.9 10.3 0.0783 0.0589 12.2 5.14 0.0619 0.0403 13.6 7.71 0.0701 0.0488
NCAPD2 11.9 7.16 0.074 0.0195 10.4 4.5 0.0597 0.0147 13.8 17.4 0.102 0.0311 11.4 6.49 0.0673 0.0174 12.5 9.67 0.0812 0.0219
NCAPD3 7.62 5.34 0.0394 0.0389 6.63 4.87 0.0335 0.0301 7.78 14.8 0.0459 0.0694 7.53 5.3 0.0373 0.0365 7.64 7.85 0.0415 0.0416
SMC2 7.43 4.38 0.0487 0.0201 5.59 3.43 0.0426 0.0143 8.18 15 0.0582 0.0354 6.85 4.2 0.0467 0.0178 7.66 5.44 0.0519 0.0229
RAD21 16.7 7.74 0.0777 0.0361 13.3 4.25 0.0531 0.0209 21.6 22 0.123 0.0892 15.5 5.68 0.0669 0.03 17.9 10.7 0.0913 0.0443
WAPL48 24.4 3.79 0.0082 0.00992 14.5 2.08 0.00576 0.00526 34.2 26.3 0.0122 0.0239 21.2 2.08 0.00714 0.0081 28.1 10.5 0.00934 0.0125
TT108 13.6 8.5 0.0746 0.0306 11.3 4.14 0.0527 0.0215 16.9 15.4 0.13 0.0478 12.7 6.55 0.0648 0.027 14.7 11.1 0.0929 0.0364

Here is link to the above table if you need to download it.

The next plot shows the correlation between \(\tau_2\) and \(k_2\) across bootstraps. The red line is a local regression line (LOESS).

3.5.2 Bootstraps (3 parameters, \(\tau_2 = 8\) min)

Here I perform bootstrap analysis with fixed \(\tau_2=8\) min.

rule boot_all_3par

Here is the distribution of bootstrap result.

And here are fit parameters and their 95% confidence intervals:

These are default box plots with whiskers encompassing 90% of data:

In this table we show the median value of each parameter:

set HL1 k1 k2
untreated 12.6 0.0636 0.0425
scramble 12.9 0.0648 0.0416
NCAPD2 11.8 0.0759 0.0173
NCAPD3 7.61 0.0397 0.0376
SMC2 7.31 0.0491 0.0205
RAD21 16.5 0.0761 0.0307
WAPL48 27.2 0.00765 0.00889
TT108 13.8 0.0726 0.0267

3.5.3 Bootstraps (3 parameters, \(\tau_2 = 0\) min)

Here I perform bootstrap analysis with fixed \(\tau_2=0\) min.

rule boot_all_3par0

Here is the distribution of bootstrap result.

And here are fit parameters and their 95% confidence intervals:

These are default box plots with whiskers encompassing 90% of data:

In this table we show the median value of each parameter:

set HL1 k1 k2
untreated 12.4 0.065 0.0354
scramble 12.8 0.0643 0.0352
NCAPD2 11.9 0.0748 0.0151
NCAPD3 7.52 0.04 0.0317
SMC2 7 0.0507 0.0183
RAD21 16.9 0.0719 0.0267
WAPL48 28.1 0.0075 0.00839
TT108 13.8 0.0724 0.0222

4 Four-colour plots

Here I create cell-line plots for all data sets using four colours.

PDF files:

link
untreated
scramble
NCAPD2
NCAPD3
SMC2
WAPL24
WAPL48
MK1775
MK1775_ICRF193
RAD21
TT103
TT108

5 Brown density

This work is not included in the paper.

In this Section we consider the distribution of brown or pink boxes. Hereafter, I call them “brown”, for simplicity, but we do take both brown and pink boxes into account.

Note on terminology: I refer to individual boxes in cell tracks as “boxes” or “points”. I refer to chains of consecutive boxes of the same type as “events”.

5.1 Point-to-point interarrival distributions

Here I calculate the density of brown boxes and the distribution of inter-arrival times between them. If brown points are distributed randomly with density \(\lambda\), the intervals between them should follow the exponential distribution with PDF

\(f(d) = \lambda e^{-\lambda d}\)

where \(d\) is the inter-arrival time between consecutive brown points. Because some of the cells have much shorter data tracks, they will bias the inter-arrival distribution towards shorter intervals. The longer distances are missing from these short tracks. To account for this I made a simple model, in which I take cell tracks from the actual data, fill them randomly with brown points and calculate inter-arrival distribution. This is done 100 times to smooth the distribution.

The example below shows the result for scramble. The black bars represent data, the orange points show the predicted theoretical distribution \(f(d) = \lambda e^{-\lambda d}\) and the open blue bars show the simulated random distribution. As we can see, the simulated distribution predicts more short intervals with respect to the theoretical distribution, as expected.

I use a window of [-300, -20] min.

This plot has limited interpretation. Inter-arrival time between individual one-minute boxes assumes that each box is one event. But in reality, brown event can last less or more than one minute. If we see, e.g., four brown boxes next to each other, it is unlikely that this shows four one-minute events. Much more likely, this is one event that lasts four minutes.

5.2 Brown event duration distribution

Instead, we should consider the distribution of full brown events, short and long. Below, I show the distribution of duration of brown events. Before the durations are calculated, I fill the occasional missing data with random brown or blue points, where brown probability is the same as the overall brown density in the data. This means that this figure is slightly random and the size of bars will change from one instance to another. The differences are rather small, though (no more than a few percent of the bar size).

The blue open bars show the expected distribution of duration of events derived from a random distribution of single brown points (noise). It shows that longer events (3 minutes or longer) are over-represented, by comparison with random noise. Brown occurrences cannot be explained by pure noise, they do form longer non-random “events”.

5.3 Brown event gap distribution

We can get more insight from the distribution of gaps between the brown events. Here I use the gaps between the events, not taking brown begin/end into account. For example, if “+” is brown/pink and “-” is blue (or red) then this cell track

---+--++++---+-

shows three brown events of length 1, 4, and 1 and two “gap” events of lengths 2 and 3. The blue sequences at the beginning and at the end are ignored, because we don’t know how long they are, we only see part of them.

The figure below shows the distribution of gaps between brows events. The simulated distribution (open bars) is calculated in the following way:

  • First, I fill missing data with random brown points with density \(\lambda\). Note that this is rather arbitrary and adds mostly one-minute points. On the other hand these missing points are not frequent, so the it doesn’t make huge difference.
  • I count the brown events of each length
  • I generate random data using the same cell track durations and the same brown events, distributed randomly
  • The process is repeated 100 times and the mean distribution of randomized data is shown
  • From this I find the distribution of gap length between events

If brown events were random, both data and simulated distributions would be similar. The p-value in the title shows the result from a chi-square test comparing data and simulation. The number “d” in each title is the density of brown boxes (\(\lambda\)).

As we can see, there is an excess of short gaps in comparison to a random distribution. This means that brown events tend to cluster together. In particular, the 1-min gaps are more prominent, suggesting that one brown event might trigger another, immediately after it. On the other hand, this is our timing resolution, so, to some extent, it might be an experimental artifact. For example, if we have a long brown event and one box in the middle is misidentified as blue, than it creates an artificial gap of size 1. If this happens a few times, we get the observed pattern.

Below, are results for all conditions. Note: each plot shows a p-value from a chi-square test between the data and the model shown. These p-values are typically very small. For clarity, I replaced each \(p < 10^{-16}\) with zero.

5.4 Distribution comparison

Here I compare event duration and gap duration distribution for each pair of conditions. I use chi-square test and the null hypothesis is that the distribution does not depend on the sample.

5.4.1 Point-to-point interval

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108
untreated
0.48 0.26 0.49 0.19 0.88 0.021 0.49 0.85 4e-05 0.53 0.12
scramble 0.48
0.0026 0.82 0.12 0.095 0.42 0.057 0.48 0.12 0.12 0.26
NCAPD2 0.26 0.0026
0.47 0.21 0.51 3.6e-05 0.69 0.19 1.2e-10 0.35 0.086
NCAPD3 0.49 0.82 0.47
0.84 0.37 0.0076 0.38 0.7 4.8e-05 0.73 0.15
SMC2 0.19 0.12 0.21 0.84
0.46 0.0039 0.27 0.33 1.6e-07 0.64 0.12
WAPL24 0.88 0.095 0.51 0.37 0.46
0.012 0.77 0.78 1e-06 0.93 0.47
WAPL48 0.021 0.42 3.6e-05 0.0076 0.0039 0.012
0.0052 0.26 0.26 0.054 0.07
MK1775 0.49 0.057 0.69 0.38 0.27 0.77 0.0052
0.28 5.6e-05 0.72 0.41
MK1775_ICRF193 0.85 0.48 0.19 0.7 0.33 0.78 0.26 0.28
0.0022 0.94 0.91
RAD21 4e-05 0.12 1.2e-10 4.8e-05 1.6e-07 1e-06 0.26 5.6e-05 0.0022
2e-07 0.00019
TT103 0.53 0.12 0.35 0.73 0.64 0.93 0.054 0.72 0.94 2e-07
0.39
TT108 0.12 0.26 0.086 0.15 0.12 0.47 0.07 0.41 0.91 0.00019 0.39

5.4.2 Event duration

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108
untreated
0.65 0.43 0.48 0.2 0.37 0.098 0.5 0.87 0.29 0.82 0.18
scramble 0.65
0.16 0.32 0.0094 0.21 0.52 0.2 0.41 0.077 0.029 0.038
NCAPD2 0.43 0.16
0.77 0.64 0.71 0.11 0.58 0.24 0.0027 0.42 0.42
NCAPD3 0.48 0.32 0.77
0.52 0.79 0.39 0.53 0.56 0.15 0.71 0.66
SMC2 0.2 0.0094 0.64 0.52
0.23 0.025 0.32 0.31 0.0013 0.53 0.85
WAPL24 0.37 0.21 0.71 0.79 0.23
0.088 0.5 0.78 0.0068 0.39 0.26
WAPL48 0.098 0.52 0.11 0.39 0.025 0.088
0.2 0.13 0.11 0.015 0.33
MK1775 0.5 0.2 0.58 0.53 0.32 0.5 0.2
0.42 0.047 0.27 0.47
MK1775_ICRF193 0.87 0.41 0.24 0.56 0.31 0.78 0.13 0.42
0.25 0.48 0.3
RAD21 0.29 0.077 0.0027 0.15 0.0013 0.0068 0.11 0.047 0.25
0.041 0.018
TT103 0.82 0.029 0.42 0.71 0.53 0.39 0.015 0.27 0.48 0.041
0.11
TT108 0.18 0.038 0.42 0.66 0.85 0.26 0.33 0.47 0.3 0.018 0.11

5.4.3 Gaps between events

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108
untreated
0.45 0.53 0.71 0.45 0.86 0.24 0.45 0.73 0.02 0.66 0.13
scramble 0.45
0.35 0.94 0.26 0.16 0.58 0.14 0.64 0.86 0.42 0.36
NCAPD2 0.53 0.35
0.56 0.56 0.66 0.26 0.76 0.53 0.09 0.46 0.3
NCAPD3 0.71 0.94 0.56
0.95 0.48 0.13 0.15 0.69 0.13 0.68 0.27
SMC2 0.45 0.26 0.56 0.95
0.53 0.34 0.29 0.41 0.011 0.61 0.096
WAPL24 0.86 0.16 0.66 0.48 0.53
0.39 0.6 0.78 0.012 0.84 0.13
WAPL48 0.24 0.58 0.26 0.13 0.34 0.39
0.2 0.73 0.52 0.92 0.52
MK1775 0.45 0.14 0.76 0.15 0.29 0.6 0.2
0.45 0.079 0.24 0.5
MK1775_ICRF193 0.73 0.64 0.53 0.69 0.41 0.78 0.73 0.45
0.51 0.99 0.69
RAD21 0.02 0.86 0.09 0.13 0.011 0.012 0.52 0.079 0.51
0.16 0.14
TT103 0.66 0.42 0.46 0.68 0.61 0.84 0.92 0.24 0.99 0.16
0.52
TT108 0.13 0.36 0.3 0.27 0.096 0.13 0.52 0.5 0.69 0.14 0.52

These tables show that there is very little discernible difference between conditions. Only TT103 differs from some other conditions in gap distribution. Otherwise, we have no evidence to reject the null hypothesis (but we cannot accept it either!).

5.5 S and late G2 data

5.5.1 Distribution comparison for point-to-point interval

WARNING: S and G2 data are calculated on a 2-min grid, while other sets are based on 1-min grid. To compare them, I bin their distributions from a smaller grid. I don’t bin raw data, as there is a problem of the reference frame. Binning 1-2, 3-4, 5-6, … will give different results than 2-3, 4-5, 6-7, …

The distributions are easier to compare. Consider a 1-min grid distribution where we have 50 1-min events and 20 2-min events. I bin them together into 50+20=70 events and compare to the number of 2-min events of S or G2 (which contains all events shorter than 2 minutes).

Depending on how the actual brown events are distributed (their true duration and timing) comparing data based on different time grids might introduce unpredictable biases.

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108 S G2
S 2.5e-05 2.5e-11 0.045 0.0002 1.3e-05 0.02 2.7e-08 0.24 0.00016 6.2e-20 0.0007 0.00072
0.24
G2 0.008 4.7e-06 0.084 0.05 0.023 0.056 2.8e-06 0.23 0.061 6.4e-13 0.33 0.054 0.31

5.5.2 Distribution comparison for event duration

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108 S G2
S 0.015 0.044 7.6e-05 0.0094 0.00021 0.003 0.18 0.035 0.077 0.62 0.00044 0.027
0.26
G2 0.00066 0.0051 1e-08 9.2e-05 5.6e-09 8.3e-05 0.051 0.0088 0.024 0.027 6.2e-07 1.4e-05 0.31

5.5.3 Distribution comparison for gaps between events

untreated scramble NCAPD2 NCAPD3 SMC2 WAPL24 WAPL48 MK1775 MK1775_ICRF193 RAD21 TT103 TT108 S G2
S 0.00042 1.6e-07 0.004 0.00027 0.00095 0.029 0.0049 0.15 0.00011 1.7e-09 8.1e-05 0.0011
0.1
G2 0.77 0.013 0.43 0.68 0.36 0.16 0.12 0.41 0.053 0.00032 0.27 0.07 0.3

6 Figures

All figures are in the this folder.

## png 
##   2
## png 
##   2
## png 
##   2